<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>LingPipe: Customers and Research Use</title>
<meta http-equiv="Content-type"
      content="application/xhtml+xml; charset=utf-8"/>
<meta http-equiv="Content-Language"
      content="en"/>
<link href="css/lp-site.css"
      title="lp-site"
      type="text/css"
      rel="stylesheet" />
</head>

<body>

<div id="header">
<h1 id="product">LingPipe</h1><h1 id="pagetitle">Customers</h1>
<a id="logo"
   href="http://alias-i.com/"
  ><img src="img/logo-small.gif" alt="alias-i logo"/>
</a>
</div><!-- head -->


<div id="navig">

<!-- set class="current" for current link -->
<ul>
<li><a href="../index.html">home</a></li>

<li><a href="demos.html">demos</a></li>

<li><a href="licensing.html">license</a></li>

<li>download
<ul>
<li><a href="download.html">lingpipe core</a></li>
<li><a href="models.html">models</a></li>
</ul>
</li>

<li>docs
<ul>
<li><a href="install.html">install</a></li>
<li><a href="../demos/tutorial/read-me.html">tutorials</a></li>
<li><a href="../docs/api/index.html">javadoc</a></li>
<li><a href="book.html">textbook</a></li>
</ul>
</li>

<li>community
<ul>
<li><a class="current" href="customers.html">customers</a></li>
<li><a href="http://groups.yahoo.com/group/LingPipe/">newsgroup</a></li>
<li><a href="http://lingpipe-blog.com/">blog</a></li>
<li><a href="bugs.html">bugs</a></li>
<li><a href="sandbox.html">sandbox</a></li>
<li><a href="competition.html">competition</a></li>
<li><a href="citations.html">citations</a></li>
</ul>
</li>

<li><a href="contact.html">contact</a></li>



<li><a href="about.html">about alias-i</a></li>
</ul>

<div class="search">
<form action="http://www.google.com/search">
<p>
<input type="hidden" name="hl" value="en" />
<input type="hidden" name="ie" value="UTF-8" />
<input type="hidden" name="oe" value="UTF-8" />
<input type="hidden" name="sitesearch" value="alias-i.com" />
<input class="query" size="10%" name="q" value="" />
<br />
<input class="submit" type="submit" value="search" name="submit" />
<span style="font-size:.6em; color:#888">by&nbsp;Google</span>
</p>
</form>
</div>

</div><!-- navig -->


<div id="content" class="content">

<h2>LingPipe Customers</h2>

<div class="sidebar">
<h2>Dual-License, Dual-Use</h2>
<p>
LingPipe's <a href="download.html">dual royalty-free/commercial
license</a> permits applications in both academia and industry.
</p>
<p>
Commercial applications drive efficiency, robustness, and scalability.
Research users drive breadth and customizability.  Both require
accuracy and usability.  And both submit feature and package
requests.
</p>
</div>

<p>
After its October 2003 release, LingPipe was adopted by commercial
customers ranging from the defense and health industries to Web 2.0 
startups.  LingPipe is also widely used in the academic community
for both research and teaching.
</p>

<p>
Customers are listed in the following sections:
</p>

<ul style="padding-bottom: 1em">
<li><a href="#commercial">Commercial Customers</a></li>
<li><a href="#patron">Research Patrons</a></li>
<li><a href="#academic">Academic: Research</a></li>
<li><a href="#teaching">Academic: Teaching</a></li>
</ul>



<a name="commercial"></a>
<h2>Commercial Customers</h2>


<h3><a href="http://www.thomson.com/corp/about/mg_lg/ab_mg_lg_overview.jsp">Thomson Legal and Regulatory</a></h3>

<p>Thomson provides the Westlaw legal search engine.
Several projects in development.
</p>

<h3><a href="http://www.edgaronline.com">EdgarOnline</a></h3>

<p>
EdgarOnline provides a search and data aggregation service for financial reports and news.
Project in development.
</p>

<h3><a href="http://www.technorati.com/">Technorati</a></h3>

<p>
Technorati provides a blog search, tagging and syndication service.
Project in development.
</p>


<h3><a href="http://www.nielsenbuzzmetrics.com">Nielsen Buzzmetrics</a></h3>

<p>
The Buzzmetrics group at Nielsen measures consumer-generated media.
We performed language-model-driven sentiment analysis for brands in blog
data.  Project deployed.
</p>


<h3><a href="http://www.dod.gov/">U. S. Department of Defense</a></h3>
<p>
We provided the ThreatTracker application based on LingPipe.  We deployed
several applications for training and evaluation. We estimate that
over 200 intelligence analysts were trained in the use of
ThreatTracker. 
</p>

<h3><a href="http://www.mitre.org/">Mitre</a></h3>

<p> We released a daily <a href="doc/Osama.ppt">Osama bin Laden tracker</a> for
Mitre's MiTAP product, which was used worldwide by intelligence
analysts and other government offices for years. Derivative products
included a &quot;Top Ten Terrorist Suspects&quot; tracker as well as a
product that tracked infectious disease outbreaks in FBIS and open
source data feeds.  The system was in production for two years.
</p>
<ul><li><span class="smallnote">
Damianos, L., Ponte, J., Wohlever, S., Reeder, F., Day, D., Wilson, G., Hirschman, L. 2002.
<a href="doc/MiTAP_AImag2002.pdf">
MiTAP for Bio-Security: A Case Study</a>
<i>AI Magazine</i> <b>23</b>(4):13-29.
</span></li></ul>

<h3><a href="http://www.endeca.com">Endeca</a></h3>
<img style="float:right; padding:1em" src="http://www.endeca.com/Collateral/Images/English-US/Partners_logos/endecaExtend.gif" alt="endeca logo"/>

<p>We supply search faceting technology based on noun phrase
extraction in French and English for Endeca.</p>
<p>
In 2009 we joined the <a
href="http://www.endeca.com/partners-technology-partners-extended-partner-program.htm">Endeca
Extend Program</a>, which exposes LingPipe functionality as a plug-in
to Endeca's faceted search engine.  </p>





<a name="patron"></a>
<h2>Research Patrons</h2>
<h3><a href="http://nlm.nih.gov/">U. S. National Library of Medicine</a></h3>

<div class="sidebar">
<h2>Startup without Venture Capital</h2>
<p>
Alias-i is wholly employee-owned.  There was no venture-capital
funding at all.
</p>
<p>
For researchers who (a) have the academic credentials to receive
research grants, and (b) want to turn their research ideas into
commercial products, government research grants are ideal.
</p>
<p>
Alias-i continues to apply for research grants as a
complementary source of funding to commercial customers.
</p>

</div>

<p>
Alias-i pitched an improved ThreatTracker-like application for
bioinformatics and received a two-year Small Business Innovation
Research (SBIR) grant from NLM, a part of the <a
href="http://nih.gov">National Institutes of Health</a> (NIH), to develop it.
</p>
<p>This project has driven most of LingPipe's recent API development,
including confidence-ranked entities and part-of-speech tags, as well
as approximate dictionary-matched extraction and classification.
</p>
<p>Application deliverables include LingBlast, a cross-document coreference
application that linked 40,000 human genes to 15 million MEDLINE
abstracts using language models to improve search results over expanded
aliases, and the relation extraction mechanism described in the database
tutorial.  The final deliverable is LingArray, a literature assay and
visualization tool similar to microarrays.
</p>


<h3><a href="http://www.darpa.mil">U. S. Defense Advanced Research Projects Agency</a></h3>

<p>
Three years of seed funding came from DARPA's <a
href="http://www.darpa.mil/ipto">Information Processing Technology
Office</a> through the <a href="http://en.wikipedia.org/wiki/DARPA_TIDES_program">Translingual Information Detection,
Extraction and Summarization</a> (TIDES) program.  Alias-i explored applications
of cross-document coreference resolution to search, tracking and
relationship mining.  (Coreference involves linking mentions of objects
in text to their real-world referents and/or to other mentions with
the same referent.)
</p>
<p>Pre-release versions of LingPipe were deployed as part
of our ThreatTracker product. Check out a screen shot of our <a
href="doc/Translingual.ppt">translingual ThreatTracker</a> prototype.
</p>


<a name="academic"></a>
<h2>Academic Research</h2>

<div class="sidebar">
<h2>Bring on the Grad Students</h2>
<p>We love working with graduate students.  It's the one thing
we really miss about academia.  
</p>
<p>Recreating common software tools is not only time-consuming and
error-prone, it's typically not the focus of a research project. 
LingPipe makes student projects easier by providing usable interfaces
for most of the common components of the &quot;linguistic
pipeline&quot; module of an application.  
</p>
<p>
Our builds, release organization and code provide documented,
well-tested examples of linguistic coding, which have themselves
proved useful for students learning how to build things like
LingPipe themselves.
</p>
</div>

<p>
We are amazed at the quantity, creativity and quality of work done
with LingPipe in fields ranging from bioinformatics to blog processing
experiments.  Below are some selected publications with a brief
description of how LingPipe was used and/or quotes.
</p>


<div class="sidebar">
<h2>Your Project Here</h2>
<p>
If you know of a research project or paper using LingPipe that is
not listed on this page, please <a href="contact.html">contact us</a>.
Ideally, send us a writeup just like this one and you'll see it in the
next release.
</p>
</div>

<h3><a href="http://mitre.org">MITRE</a>, <a href="http://brandeis.edu">Brandeis University</a></h3>
<p>LingPipe is used in concert with another named entity recognition tool (Carafe) to de-identify patient records. De-identification means removing all
patient-specific information from the text.
</p>
<ul><li><span class="smallnote">
Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan A, Peshkin L, Yeh A, Hitzeman J, Hirschman L. 2007.
<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=PubMed&amp;Cmd=ShowDetailView&amp;TermToSearch=17600096&amp;ordinalpos=1&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum">Rapidly Retargetable Approaches to De-identification in Medical Records</a> <i>Journal of the American Medical Informatics Association Online</i>
</span>
</li></ul>

<h3><a href="http://www.unt.edu">University of North Texas</a></h3>
<p>
LingPipe is used for sentence detection for the TREC conference, and
perhaps for coreference resolution (this is not entirely clear from the paper).
</p>
<ul><li><span class="smallnote">
J. Chen, H. Ge, Y. Wu, S. Jiang.
2004.
<a href="http://scholar.google.com/url?sa=U&amp;q=http://trec.nist.gov/pubs/trec13/papers/unorthtexas.qa.pdf">UNT at TREC 2004: Question Answering Combining Multiple Evidences</a>.  <i>TREC Proceedings</i>.
</span></li></ul>

<h3><a href="http://www.dfki.de">German Research Center for AI (DFKI)</a></h3>
<p>
&quot;LingPipe, which is a software package from Alias-i, consists of
several language processing modules: a statistical named entity
recognizer, a heuristic sentence splitter, and a heuristic
within-document co-reference resolution system.  LingPipe comes with
a English language model. The types of NE covered by LingPipe are
locations, persons and organizations. We have re-trained LingPipe so
as to cover more named entities types: DATE for English, and both DATE
and NUMBER for German. We extended the co-reference resolution
algorithm to count for German pronouns as well. A large Gazetteer of
named entity instances has been used for both languages and for
English a PERSON Gazetteer with gender attributes has been integrated
for a better co-reference resolution.&quot;
</p>
<ul><li><span class="smallnote">
G. Neumann and B. Sacaleanu. 2005. <a href="http://www.dfki.de/~neumann/publications/new-ps/34910411.pdf">Experiments on Robust NL Question Interpretation and Multi-layered Document Annotation for a Cross-Language Question/Answering System</a>. Proceedings of <i>CLEF</i>, LNCS 3491. Springer.
</span></li></ul>


<h3><a href="http://uni-hildesheim.de">University of Hildesheim</a></h3>
<p>&quot;LingPipe
was used as a basic tool. Lingpipe applies a statistical machine learning approach to named entity
recognition and categorization. For training LingPipe, we used one annotated corpus for each language: German: Frankfurter Rundschau with 36 Million word forms (Source: Linguistic Data Consortium, LDC); 
English: Reuters News (810.000 news texts)&quot;
</p>
<ul><li><span class="smallnote">
R. Strötgen, T. Mandl and R. Schneider. 2005. <a href="http://www.clef-campaign.org/2005/working_notes/workingnotes2005/strotgen05.pdf">A Fast Forward Approach to Cross-lingual Question Answering for English and German</a>.  Proceedings of <i>CLEF</i>.
</span></li></ul>



<h3><a href="http://www.thomson.com/corp/about/mg_lg/ab_mg_lg_overview.jsp">Thomson Legal and Regulatory</a></h3>
<p>
Thomson Legal and Regulatory used LingPipe to do pronoun
resolution for a summarization system they built. LingPipe was not doing
the right thing out of the box, so they extended it to handle longer
distance anaphora. That is, of course, a selling point for LingPipe.
</p>
<ul><li><span class="smallnote">
F. Schilder, A. McCulloh, B. T. McInnes and A. Zhou. 2005. <a href="http://www-nlpir.nist.gov/projects/duc/pubs/2005papers/thomson-lr.schilder.pdf">TLR at DUC: Tree similarity</a>. Proceedings of <i>DUC</i>.
</span></li></ul>

<h3><a href="http://www.cam.ac.uk">Cambridge University</a></h3>
<p>
&quot; The Lingpipe NER module achieves high precision by only
generalizing to unseen names in lexical contexts which are clearly indicative
of gene names in the training data.... We
tested the performance of LingPipe on both annotations using standard
definitions of Recall/Precision/F-score achieving 0.8086/0.7485/0.7774 and
0.8423/0.8483/0.8453, respectively.&quot;
</p>
<p>
&quot;Morgan et al, evaluating on the first set of annotations, reported that
they achieved 0.71/0.78/0.75. Comparing the systems, our performance is
a little better, especially in terms of recall. &quot;
</p>
<p>
&quot;
On unseen tokens, compared to Morgan et al. our performance is
significantly higher (0.619 on merged and 0.5365 on morgan compared to
0.33 F-score), which can be attributed to the treatment of unseen
tokens by LingPipe. &quot;
</p>
<p>
&quot;
For each token classified, we estimated the entropy of the
distribution of Equation 1 computed by LingPipe, which gave us an
indication of how (un)certain the classifier was of its decision. We
observed that many of the recall errors occurred in cases in which the
HMM model classified a token with entropy close to 1, i.e. with high
uncertainty.  We post-processed the output of the classifier by
re-annotating as genes unseen tokens that were classified as ordinary
words with entropy higher than a specified threshold.
&quot;
</p>
<p>
The very satisfying bit of the paper is that they got into the guts
of LingPipe and improved it, as quoted above; this is why we give it to
researchers.  In return, we helped them integrate LingPipe into
their partially word-annotated XML pipeline.
</p>
<ul><li><span class="smallnote">
A. Vlachos, C. Gasperin, I. Lewin and T. Briscoe. 2006. <a href="http://helix-web.stanford.edu/psb06/vlachos.pdf">Bootstrapping the recognition and anaphoric linking of named entities in <i>drosophila</i> articles</a>. Proceedings of <i>Pacific Symposium on Biocomputing</i>.
</span></li></ul>


<h3><a href="http://www.unsw.edu.au">University of New South Wales</a></h3>
<p>
Used LingPipe for named entity recognition for the geographic part
of the CLEF cross-lingual information retrieval evaluation.
</p>
<ul><li><span class="smallnote">
Y.-H. Hu and L. Ge.  2006.
<a href="http://www.gmat.unsw.edu.au/snap/publications/hu_etal2006b.pdf">UNSW at GeoCLEF 2006</a>. Proceedings of <i>CLEF</i>.
</span></li></ul>


<a name="teaching"></a>
<h2>LingPipe in Education</h2>

<div class="sidebar">
<h2>Bring on the Questions</h2> 
<p>We're happy to field questions from students and faculty either
directly or through the LingPipe mailing list, and are also happy to
suggest appropriate projects at any level from beginning undergraduate
to Ph.D. qualifiers.
</p>
</div>

<div class="sidebar">
<h2>Your Class Here</h2>
<p>
If you know of a class not listed here that's using LingPipe, please
<a href="contact.html">contact us</a>.
Ideally, send us a writeup just like this one and you'll see it in the
next release.
</p>
</div>

<p>
LingPipe is widely used as the basis for assignments or even whole
courses.  These are mostly upper undergraduate and beginning graduate
courses in search, natural language processing or data mining. 
</p>
<p>
The following is a list of some courses we know about.  Please
feel free to submit more.
</p>

<h3><a href="http://www.washington.edu">University of Washington</a></h3>
<ul style="margin-bottom: 1.5em"><li>
<a href="http://faculty.washington.edu/wlewis2/courses/573/573-syllabus.htm">Linguistics 573: Systems/Applications</a>
&nbsp;
(Professor: <a href="http://faculty.washington.edu/wlewis2/">William D. Lewis</a>)
</li></ul>

<h3><a href="http://www.iit.edu">Illinois Institute of Technology</a></h3>
<ul style="margin-bottom: 1.5em"><li>
<a href="http://ir.iit.edu/~dagr/cs522/">CS 522: Data Mining</a>
&nbsp;
(Professor: <a href="http://www.ir.iit.edu/~dagr/">David Grossman</a>)
</li></ul>

<h3><a href="http://uva.nl">University of Amsterdam</a></h3>
<p>
A project-oriented master's course that used LingPipe as a significant
component.  They produced an interesting <a href="http://www.ifarm.nl/erikt/ltp2006/ltp2006.pdf">system report</a>, as well.
</p>
<ul><li>
<a href="http://www.ifarm.nl/erikt/ltp2006/">Language Technology Project</a>
&nbsp;
(Professor: <a href="http://www.ifarm.nl/erikt/">Erik Tjong Kim Sang</a>)
</li></ul>

</div><!-- content -->



<div id="foot">
<p>
&#169; 2003&ndash;2011 &nbsp;
<a href="mailto:lingpipe@alias-i.com">alias-i</a>
</p>
</div>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-15123726-1");
pageTracker._trackPageview();
} catch(err) {}</script></body>
</html>


